Testing bounded arboricity
In this paper we consider the problem of testing whether a graph has bounded
arboricity. The family of graphs with bounded arboricity includes, among
others, bounded-degree graphs, all minor-closed graph classes (e.g. planar
graphs, graphs with bounded treewidth) and randomly generated preferential
attachment graphs. Graphs with bounded arboricity have been studied extensively
in the past, in particular since for many problems they allow for much more
efficient algorithms and/or better approximation ratios.
We present a tolerant tester in the sparse-graphs model. The sparse-graphs
model allows access to degree queries and neighbor queries, and the distance is
defined with respect to the actual number of edges. More specifically, our
algorithm distinguishes between graphs that are eps-close to having
arboricity alpha and graphs that are c * eps-far from having
arboricity 3 * alpha, where c is an absolute small constant. The query
complexity and running time of the algorithm are
O~(n/sqrt(m) * log(1/eps)/eps + (n * alpha / m) * (1/eps)^{O(log(1/eps))}), where n denotes
the number of vertices and m denotes the number of edges. In terms of the
dependence on n and m this bound is optimal up to poly-logarithmic factors
since Omega(n/sqrt(m)) queries are necessary (and alpha = O(sqrt(m))).
We leave it as an open question whether the dependence on 1/eps can be
improved from quasi-polynomial to polynomial. Our techniques include an
efficient local simulation for approximating the outcome of a global (almost)
forest-decomposition algorithm, as well as a tailored procedure of edge
sampling.
On Sampling Edges Almost Uniformly
We consider the problem of sampling an edge almost uniformly from an unknown graph, G = (V, E). Access to the graph is provided via queries of the following types: (1) uniform vertex queries, (2) degree queries, and (3) neighbor queries. We describe a new simple algorithm that returns a random edge e in E using tilde{O}(n/sqrt{eps m}) queries in expectation, such that each edge e is sampled with probability (1 +/- eps)/m. Here, n = |V| is the number of vertices, and m = |E| is the number of edges. Our algorithm is optimal in the sense that any algorithm that samples an edge from an almost-uniform distribution must perform Omega(n/sqrt{m}) queries.
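The light-versus-heavy rejection-sampling idea behind this line of work can be sketched as follows. This is a simplified illustration, not the paper's exact procedure: for instance, it computes m exactly to set the lightness threshold theta = sqrt(2m/eps), whereas a sublinear algorithm only has query access, and the function name is our own.

```python
import random

def sample_edge_almost_uniform(adj, eps, rng=random):
    """Illustrative sketch: vertices of degree <= theta are 'light'.
    Edges at light vertices are sampled directly; edges at heavy vertices
    are reached by first landing on a light neighbor and stepping across.
    adj: dict mapping each vertex to its list of neighbors."""
    verts = list(adj)
    m = sum(len(adj[v]) for v in verts) // 2   # illustration only: assumes m is known
    theta = (2 * m / eps) ** 0.5               # lightness threshold
    while True:
        v = rng.choice(verts)                  # uniform vertex query
        i = rng.randrange(int(theta))          # slot in a virtual length-theta list
        if len(adj[v]) > theta or i >= len(adj[v]):
            continue                           # heavy vertex or empty slot: retry
        u = adj[v][i]                          # neighbor query
        if rng.random() < 0.5:
            return (v, u)                      # light-edge attempt succeeded
        if len(adj[u]) > theta:
            return (u, rng.choice(adj[u]))     # heavy-edge attempt via light neighbor
```

Each attempt type hits an oriented edge with probability about 1/(2 * n * theta); the full analysis bounds the residual bias by the (1 +/- eps) factor.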
Lower Bounds for Approximating Graph Parameters via Communication Complexity
In a celebrated work, Blais, Brody, and Matulef [Blais et al., 2012] developed a technique for proving property testing lower bounds via reductions from communication complexity. Their work focused on testing properties of functions, and yielded new lower bounds as well as simplified analyses of known lower bounds. Here, we take a further step in generalizing the methodology of [Blais et al., 2012] to analyze the query complexity of graph parameter estimation problems. In particular, our technique decouples the lower bound arguments from the representation of the graph, allowing it to work with any query type.
We illustrate our technique by providing new simpler proofs of previously known tight lower bounds for the query complexity of several graph problems: estimating the number of edges in a graph, sampling edges from an almost-uniform distribution, estimating the number of triangles (and more generally, r-cliques) in a graph, and estimating the moments of the degree distribution of a graph. We also prove new lower bounds for estimating the edge connectivity of a graph and estimating the number of instances of any fixed subgraph in a graph. We show that the lower bounds for estimating the number of triangles and edge connectivity also hold in a strictly stronger computational model that allows access to uniformly random edge samples.
Approximately Counting Triangles in Sublinear Time
We consider the problem of estimating the number of triangles in a graph.
This problem has been extensively studied in both theory and practice, but all
existing algorithms read the entire graph. In this work we design a {\em
sublinear-time\/} algorithm for approximating the number of triangles in a
graph, where the algorithm is given query access to the graph. The allowed
queries are degree queries, vertex-pair queries and neighbor queries.
We show that for any given approximation parameter 0 < eps < 1, the
algorithm provides an estimate t' such that with high constant
probability, (1 - eps) * t < t' < (1 + eps) * t, where t
is the number of triangles in the graph G. The expected query complexity of
the algorithm is O~(n/t^{1/3} + min{m, m^{3/2}/t}) * poly(log n, 1/eps), where n
is the number of vertices in the graph and m is the number of edges, and
the expected running time is O~(n/t^{1/3} + m^{3/2}/t) * poly(log n, 1/eps). We also prove
that Omega(n/t^{1/3} + min{m, m^{3/2}/t}) queries are necessary, thus establishing that
the query complexity of this algorithm is optimal up to polylogarithmic factors
in n (and the dependence on eps).

Comment: To appear in the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2015).
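For intuition about why wedge statistics govern triangle counting, here is a classical wedge-sampling baseline: every triangle closes exactly three wedges (paths of length two), so the closed-wedge rate converts into a triangle count. This is a standard full-access estimator, not the paper's query-efficient sublinear algorithm, and the function name is ours.

```python
import random
from math import comb

def estimate_triangles_wedge_sampling(adj, samples, rng=random):
    """Sample wedge centers v proportionally to C(d_v, 2); the fraction of
    sampled wedges that are closed, times (total wedges)/3, estimates t."""
    verts = list(adj)
    weights = [comb(len(adj[v]), 2) for v in verts]  # wedges centered at v
    total_wedges = sum(weights)
    if total_wedges == 0:
        return 0.0                                   # no wedges, no triangles
    closed = 0
    for _ in range(samples):
        v = rng.choices(verts, weights=weights)[0]   # wedge center
        u, w = rng.sample(adj[v], 2)                 # two distinct neighbors
        closed += w in adj[u]                        # closed wedge = triangle
    return total_wedges * closed / (3 * samples)
```

The estimator is unbiased, but its sample complexity depends on the wedge-to-triangle ratio, which is exactly the kind of dependence the sublinear algorithm has to control more carefully.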
Provable and practical approximations for the degree distribution using sublinear graph samples
The degree distribution is one of the most fundamental properties used in the
analysis of massive graphs. There is a large literature on graph sampling,
where the goal is to estimate properties (especially the degree distribution)
of a large graph through a small, random sample. The degree distribution
estimation poses a significant challenge, due to its heavy-tailed nature and
the large variance in degrees.
We design a new algorithm, SADDLES, for this problem, using recent
mathematical techniques from the field of sublinear algorithms. The SADDLES
algorithm gives provably accurate outputs for all values of the degree
distribution. For the analysis, we define two fatness measures of the degree
distribution, called the h-index and the z-index. We prove that SADDLES is
sublinear in the graph size when these indices are large. A corollary of this
result is a provably sublinear algorithm for any degree distribution bounded
below by a power law.
We deploy our new algorithm on a variety of real datasets and demonstrate its
excellent empirical behavior. In all instances, we get extremely accurate
approximations for all values in the degree distribution by observing at most
1% of the vertices. This is a major improvement over the state-of-the-art
sampling algorithms, which typically sample more than 10% of the vertices to
give comparable results. We also observe that the h- and z-indices of real
graphs are large, validating our theoretical analysis.

Comment: Longer version of the WWW 2018 submission.
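The head/tail difficulty described above is visible in two textbook estimators for the ccdh N(d) = |{v : deg(v) >= d}|: uniform vertex samples rarely hit the heavy tail, while degree-biased samples (uniform edge endpoints, reweighted by 1/deg) rarely hit the head. A hypothetical sketch of the two, for intuition only; SADDLES combines vertex and edge samples more carefully than this.

```python
import random

def estimate_ccdh(adj, d, samples, rng=random):
    """Two unbiased estimators for N(d) = #{v : deg(v) >= d}.
    head: n * (fraction of uniform vertex samples with degree >= d).
    tail: an endpoint v is drawn with prob deg(v)/2m, so weighting each
          hit by 2m/deg(v) is also unbiased, with low variance for large d."""
    verts = list(adj)
    n = len(verts)
    endpoints = [v for v in verts for _ in adj[v]]  # v repeated deg(v) times
    two_m = len(endpoints)
    head = n * sum(len(adj[rng.choice(verts)]) >= d
                   for _ in range(samples)) / samples
    tail = 0.0
    for _ in range(samples):
        v = rng.choice(endpoints)                   # degree-biased sample
        if len(adj[v]) >= d:
            tail += two_m / len(adj[v])             # importance weight
    return head, tail / samples
```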
Sublinear-Time Distributed Algorithms for Detecting Small Cliques and Even Cycles
In this paper we give sublinear-time distributed algorithms in the CONGEST model for subgraph detection for two classes of graphs: cliques and even-length cycles. We show for the first time that all copies of 4-cliques and 5-cliques in the network graph can be listed in sublinear time, O(n^{5/6+o(1)}) rounds and O(n^{21/22+o(1)}) rounds, respectively. Prior to our work, it was not known whether it was possible to even check if the network contains a 4-clique or a 5-clique in sublinear time.
For even-length cycles, C_{2k}, we give an improved sublinear-time algorithm, which exploits a new connection to extremal combinatorics. For example, for 6-cycles we improve the running time from O~(n^{5/6}) to O~(n^{3/4}) rounds. We also show two obstacles to proving lower bounds for C_{2k}-freeness: First, we use the new connection to extremal combinatorics to show that the current lower bound of Omega~(sqrt{n}) rounds for 6-cycle freeness cannot be improved using partition-based reductions from 2-party communication complexity, the technique by which all known lower bounds on subgraph detection have been proven to date. Second, we show that there is some fixed constant delta in (0,1/2) such that for any k, an Omega(n^{1/2+delta}) lower bound on C_{2k}-freeness implies new lower bounds in circuit complexity.
For general subgraphs, it was shown in [Orr Fischer et al., 2018] that for any fixed k, there exists a subgraph H of size k such that H-freeness requires Omega~(n^{2-Theta(1/k)}) rounds. It was left as an open problem whether this is tight, or whether some constant-sized subgraph requires truly quadratic time to detect. We show that in fact, for any subgraph H of constant size k, the H-freeness problem can be solved in O(n^{2 - Theta(1/k)}) rounds, nearly matching the lower bound of [Orr Fischer et al., 2018].
Sublinear Time Estimation of Degree Distribution Moments: The Degeneracy Connection
We revisit the classic problem of estimating the degree distribution moments of an undirected graph. Consider an undirected graph G=(V,E) with n (non-isolated) vertices, and define (for s > 0) mu_s = (1/n) * sum_{v in V} d_v^s. Our aim is to estimate mu_s within a multiplicative error of (1+epsilon) (for a given approximation parameter epsilon>0) in sublinear time. We consider the sparse graph model that allows access to: uniform random vertices, queries for the degree of any vertex, and queries for a neighbor of any vertex. For the case of s=1 (the average degree), O~(sqrt{n}) queries suffice for any constant epsilon (Feige, SICOMP 06 and Goldreich-Ron, RSA 08). Gonen-Ron-Shavitt (SIDMA 11) extended this result to all integral s > 0, by designing an algorithm that performs O~(n^{1-1/(s+1)}) queries. (Strictly speaking, their algorithm approximates the number of star-subgraphs of a given size, but a slight modification gives an algorithm for moments.)
We design a new, significantly simpler algorithm for this problem. In the worst case, it exactly matches the bounds of Gonen-Ron-Shavitt, and has a much simpler proof. More importantly, the running time of this algorithm is connected to the degeneracy of G. This is (essentially) the maximum density of an induced subgraph. For the family of graphs with degeneracy at most alpha, it has a query complexity of O~((n^{1-1/s} / mu_s^{1/s}) * (alpha^{1/s} + min{alpha, mu_s^{1/s}})) = O~(n^{1-1/s} * alpha / mu_s^{1/s}). Thus, for the class of bounded-degeneracy graphs (which includes all minor-closed families and preferential attachment graphs), we can estimate the average degree in O~(1) queries, and can estimate the variance of the degree distribution in O~(sqrt{n}) queries. This is a major improvement over the previous worst-case bounds. Our key insight is in designing an estimator for mu_s that has low variance when G does not have large dense subgraphs.
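The degeneracy alpha referenced here is the smallest bound such that every subgraph of G has a vertex of degree at most that bound; it can be computed exactly by the classic min-degree peeling procedure (Matula-Beck), sketched below with a lazy-deletion heap.

```python
import heapq

def degeneracy(adj):
    """Repeatedly remove a minimum-degree vertex; the degeneracy is the
    largest degree observed at removal time. Stale heap entries are skipped."""
    deg = {v: len(ns) for v, ns in adj.items()}
    removed = set()
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    best = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue                         # stale entry: degree has changed
        removed.add(v)
        best = max(best, d)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1                  # peel v away from its neighbors
                heapq.heappush(heap, (deg[u], u))
    return best
```

For example, planar graphs have degeneracy at most 5, which is what lets the bounded-degeneracy bounds above cover all minor-closed families.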
Sampling Multiple Edges Efficiently
We present a sublinear time algorithm that allows one to sample multiple edges from a distribution that is pointwise eps-close to the uniform distribution, in an amortized-efficient fashion. We consider the adjacency list query model, where access to a graph G is given via degree and neighbor queries.
The problem of sampling a single edge in this model has been raised by Eden and Rosenbaum (SOSA 18). Let n and m denote the number of vertices and edges of G, respectively. Eden and Rosenbaum provided upper and lower bounds of Theta^*(n/sqrt{m}) for sampling a single edge in general graphs (where O^*(.) suppresses poly(1/eps) and poly(log n) dependencies). We ask whether the query complexity lower bound for sampling a single edge can be circumvented when multiple samples are required. That is, can we get an improved amortized per-sample cost if we allow a preprocessing phase? We answer in the affirmative.
We present an algorithm that, if one knows the number of required samples q in advance, has an overall cost that is sublinear in q, namely, O^*(sqrt{q} * (n/sqrt{m})), which is strictly preferable to the O^*(q * (n/sqrt{m})) cost resulting from q invocations of the algorithm by Eden and Rosenbaum.
Subsequent to a preliminary version of this work, Tětek and Thorup (arXiv, preprint) proved that this bound is essentially optimal.
Embeddings and Labeling Schemes for A*
A* is a classic and popular method for graph search and pathfinding. It assumes the existence of a heuristic function h(u,t) that estimates the shortest distance from any input node u to the destination t. Traditionally, heuristics have been handcrafted by domain experts. However, over the last few years, there has been a growing interest in learning heuristic functions. Such learned heuristics estimate the distance between given nodes based on "features" of those nodes.
In this paper we formalize and initiate the study of such feature-based heuristics. In particular, we consider heuristics induced by norm embeddings and distance labeling schemes, and provide lower bounds for the tradeoffs between the number of dimensions or bits used to represent each graph node, and the running time of the A* algorithm. We also show that, under natural assumptions, our lower bounds are almost optimal.
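For reference, the A* procedure whose running time these tradeoffs concern, in textbook form. The adjacency representation and the exact line-graph heuristic in the usage example are illustrative assumptions; with a consistent heuristic h, the first time t is popped its distance is optimal.

```python
import heapq

def a_star(adj, s, t, h):
    """adj maps u -> list of (v, weight) edges; h(u, t) estimates dist(u, t).
    Expands nodes in order of f = g + h; returns the s-t distance, or None
    if t is unreachable from s."""
    dist = {s: 0}
    pq = [(h(s, t), s)]
    while pq:
        f, u = heapq.heappop(pq)
        if u == t:
            return dist[t]
        if f > dist[u] + h(u, t):
            continue                          # stale entry: a shorter g was found
        for v, w in adj[u]:
            nd = dist[u] + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd + h(v, t), v))
    return None
```

With h = 0 this degenerates to Dijkstra's algorithm; a better h prunes more of the search, which is exactly the quantity the paper's embedding-size tradeoffs measure.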